Shape Modeling with Spline Partitions
Shape modelling (with methods that output shapes) is a new and important task
in Bayesian nonparametrics and bioinformatics. In this work, we focus on
Bayesian nonparametric methods for capturing shapes by partitioning a space
using curves. In related work, the classical Mondrian process is used to
partition spaces recursively with axis-aligned cuts, and is widely applied in
multi-dimensional and relational data. The Mondrian process outputs
hyper-rectangles. Recently, the random tessellation process was introduced as a
generalization of the Mondrian process, partitioning a domain with non-axis
aligned cuts in an arbitrary dimensional space, and outputting polytopes.
Motivated by these processes, in this work, we propose a novel parallelized
Bayesian nonparametric approach to partition a domain with curves, enabling
complex data-shapes to be acquired. We apply our method to an HIV-1-infected human
macrophage image dataset, and also to simulated datasets, to illustrate our
approach. We compare to support vector machines, random forests and
state-of-the-art computer vision methods such as simple linear iterative
clustering superpixel image segmentation. We develop an R package that is
available at
https://github.com/ShufeiGe/Shape-Modeling-with-Spline-Partitions
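For readers unfamiliar with the Mondrian process mentioned in this abstract, the following is a minimal Python sketch of its standard generative recursion: axis-aligned cuts on a box, which yield hyper-rectangles. It is not the spline-based method proposed in the paper and is unrelated to the R package; the names `sample_mondrian`, `box`, and `budget` are illustrative assumptions.

```python
import random

def sample_mondrian(box, budget, rng):
    """Sample a Mondrian partition of `box` (illustrative sketch).

    box:    list of (low, high) intervals, one per dimension
    budget: remaining lifetime; larger values give finer partitions
    rng:    a random.Random instance
    Returns a list of boxes (hyper-rectangles) that tile the input box.
    """
    lengths = [high - low for low, high in box]
    # The waiting time until the next cut is exponential, with rate equal
    # to the sum of the side lengths of the current box.
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf cell
    # Pick the cut dimension with probability proportional to side length,
    # then cut uniformly along that side.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    low, high = box[dim]
    cut = rng.uniform(low, high)
    left, right = list(box), list(box)
    left[dim] = (low, cut)
    right[dim] = (cut, high)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))

# Example: partition the unit square with a lifetime budget of 3.
cells = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], 3.0, random.Random(0))
```

Every cell returned is an axis-aligned rectangle, which is the restriction that the random tessellation process and the curve-based partitions described here are designed to relax.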
Random Tessellation Forests
Space partitioning methods such as random forests and the Mondrian process
are powerful machine learning methods for multi-dimensional and relational
data, and are based on recursively cutting a domain. The flexibility of these
methods is often limited by the requirement that the cuts be axis aligned. The
Ostomachion process and the self-consistent binary space partitioning-tree
process were recently introduced as generalizations of the Mondrian process for
space partitioning with non-axis aligned cuts in the two dimensional plane.
Motivated by the need for a multi-dimensional partitioning tree with non-axis
aligned cuts, we propose the Random Tessellation Process (RTP), a framework
that includes the Mondrian process and the binary space partitioning-tree
process as special cases. We derive a sequential Monte Carlo algorithm for
inference, and provide random forest methods. Our process is self-consistent
and can relax axis-aligned constraints, allowing complex inter-dimensional
dependence to be captured. We present a simulation study, and analyse gene
expression data of brain tissue, showing improved accuracies over other
methods.
Comment: 11 pages, 4 figures
Genome-Wide Association with Uncertainty in the Genetic Similarity Matrix
Genome-wide association studies (GWASs) are often confounded by population stratification and structure. Linear mixed models (LMMs) are a powerful class of methods for uncovering genetic effects, while controlling for such confounding. LMMs include random effects for a genetic similarity matrix, and they assume that a true genetic similarity matrix is known. However, uncertainty about the phylogenetic structure of a study population may degrade the quality of LMM results. This may happen in bacterial studies in which the number of samples or loci is small, or in studies with low-quality genotyping. In this study, we develop methods for linear mixed models in which the genetic similarity matrix is unknown and is derived from Markov chain Monte Carlo estimates of the phylogeny. We apply our model to a GWAS of multidrug resistance in tuberculosis, and illustrate our methods on simulated data.
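For orientation, a standard textbook form of the linear mixed model referred to in this abstract is sketched below. The notation is an assumption and is not taken from the paper: $y$ is the phenotype vector, $X$ holds the fixed-effect covariates including the tested variant, and $K$ is the genetic similarity matrix.

```latex
\begin{align}
  y &= X\beta + u + \varepsilon, \\
  u &\sim \mathcal{N}\!\left(0,\ \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma_e^2 I\right).
\end{align}
```

In this form $K$ is treated as fixed and known. The abstract's point is that $K$ may itself be uncertain when it is derived from an estimated phylogeny.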
Statistical machine learning in computational genetics
Statistical machine learning has played a key role in many areas, such as biology, health sciences, finance and genetics. Important tasks in computational genetics include disease prediction, capturing shapes within images, computation of genetic sharing between pairs of individuals, genome-wide association studies and image clustering. This thesis develops several learning methods to address these computational genetics problems. Firstly, motivated by the need for fast computation of genetic sharing among pairs of individuals, we propose the fastest algorithms for computing the kinship coefficient of a set of individuals with a known large pedigree. Moreover, we consider the possibility that the founders of the known pedigree may themselves be inbred and compute the appropriate inbreeding-adjusted kinship coefficients, a case that has not been addressed in the literature. Secondly, motivated by an imaging genetics study of the Alzheimer's Disease Neuroimaging Initiative, we develop a Bayesian bivariate spatial group lasso model for multivariate regression analysis that can be used to examine the influence of genetic variation on brain structure and that accommodates the correlation structures typically seen in structural brain imaging data. We develop a mean-field variational Bayes algorithm and a Gibbs sampling algorithm to fit the model. We also incorporate Bayesian false discovery rate procedures to select SNPs. The new spatial model demonstrates superior performance over a standard model in our application. Thirdly, we propose the Random Tessellation Process (RTP) to model complex genetic data structures to predict disease status. The RTP is a multi-dimensional partitioning tree with non-axis aligned cuts. We develop a sequential Monte Carlo (SMC) algorithm for inference. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured.
Fourthly, we propose the Random Tessellation with Splines (RTS) to acquire complex shapes within images. The RTS provides a framework for describing Bayesian nonparametric models based on partitioning two-dimensional Euclidean space with splines. We also develop an inference algorithm that is "embarrassingly parallel". Finally, we extend the mixtures of spatial spline regression with mixed-effects model under the Bayesian framework to accommodate streaming image data. We propose an SMC algorithm to analyze brain image data in an online fashion.
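As background for the kinship coefficient mentioned in the first contribution, here is a minimal Python sketch of the classical recursive definition on a known pedigree. It is not the thesis's fast algorithm. It assumes founders are unrelated and non-inbred, so it does not cover the inbred-founder case the thesis treats, and it assumes each individual has either both parents recorded or neither. The name `kinship_function` and the pedigree encoding are illustrative assumptions.

```python
from functools import lru_cache

def kinship_function(pedigree):
    """Build a kinship-coefficient function for a pedigree (sketch).

    pedigree: dict mapping individual -> (father, mother), with
              (None, None) for founders.
    Assumes founders are unrelated and non-inbred.
    """
    depth = {}

    def compute_depth(ind):
        # Founders have depth 0; otherwise one more than the deeper parent.
        if ind not in depth:
            father, mother = pedigree[ind]
            if father is None:
                depth[ind] = 0
            else:
                depth[ind] = 1 + max(compute_depth(father),
                                     compute_depth(mother))
        return depth[ind]

    for ind in pedigree:
        compute_depth(ind)

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a == b:
            father, mother = pedigree[a]
            if father is None:
                return 0.5  # non-inbred founder
            return 0.5 * (1.0 + phi(father, mother))
        # Expand the deeper individual: it cannot be an ancestor of the other.
        if depth[a] < depth[b]:
            a, b = b, a
        father, mother = pedigree[a]
        if father is None:
            return 0.0  # two distinct, unrelated founders
        return 0.5 * (phi(father, b) + phi(mother, b))

    return phi

# Example: two founders, their two children, and a grandchild.
pedigree = {
    "A": (None, None), "B": (None, None),
    "C": ("A", "B"), "D": ("A", "B"),
    "E": ("C", "D"),
}
phi = kinship_function(pedigree)
```

Memoising with `lru_cache` avoids recomputing shared ancestor pairs, but the recursion can still be slow on large pedigrees, which is the setting the thesis's faster algorithms target.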
Energy and Operation Characteristics of Electric Excavator With Innovative Hydraulic-Electric Dual Power Drive Boom System
In existing electric excavators, the energy efficiency of the hydraulic system is less than 30% due to a large amount of throttling loss and waste of potential energy. In order to improve excavator energy efficiency, an electric excavator scheme using a hydraulic-electric dual-power drive boom system is proposed. A linear actuator, including an electro-mechanical unit and a hydraulic unit, was adopted in the boom system. The boom velocity is controlled by the electro-mechanical unit instead of a hydraulic valve to reduce throttling loss. The non-rod chamber of the linear actuator is connected to a hydraulic accumulator to reutilize boom gravitational potential energy. In addition, when the boom and other devices are operated together, the throttling loss caused by the load difference between multiple actuators can be reduced because the linear actuator can compensate for the pump pressure. The working principle, control strategy, and characteristics of the proposed electric excavator were analyzed through theory and experiments. The results show that the proposed system can reduce throttling loss and efficiently reutilize the boom gravitational potential energy. During the boom lifting and lowering process, the reutilization rate of the boom gravitational potential energy is 67.6%, and the energy consumption is reduced by 66.1%. During the land levelling process, the throttling loss of the electric excavator is reduced by 49.6% and the energy consumption is reduced by 38.1%. The research results will provide new methods for the electrification of construction machinery.